Source: www.redbubble.com Photo Credit: Nino Marcutti/Alamy Stock

1. Introduction

This is an ongoing analysis aimed at tracking the impact of COVID-19 on the City of New York (NYC). Using data provided by City officials, this work provides insights into the daily and total number of COVID infections, hospitalizations and deaths in the City as a whole. Additionally, it looks at COVID cases among various demographic groups to determine which ones have been impacted the most.

2. Data

The NYC Department of Health and Mental Hygiene (DOHMH) publishes an open-source COVID-19 database in its GitHub repository. The database, which is updated daily, contains numerous data sets that provide details about COVID cases, testing and vaccinations. However, this work uses only three data sets from the repository: data-by-day, data-by-group and data-by-modzcta. Below are brief descriptions of each data set.

  • data-by-day: Provides a daily summary of all COVID cases, hospitalizations and deaths in the City as a whole, and by borough.

  • data-by-group: Provides a breakdown of the total number of cases, hospitalizations and deaths by demographic group, including borough, age, gender, and race.

  • data-by-modzcta: Gives a breakdown of aggregate cases by neighborhood and modified ZIP code. This data can be used to map COVID cases and deaths by neighborhood when combined with the MODZCTA shapefiles (available from DOHMH’s GitHub repository or the NYC Open Data Portal).

Now, let’s extract and load the three COVID data sets from the DOHMH GitHub page and the shapefiles from the NYC Open Data Portal.
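The loading step could look roughly like the sketch below. The repository URL, the branch name and the file paths are assumptions and are not given in this write-up, so they should be checked against the DOHMH repository before use. The helper names build_url and load_covid_data are hypothetical.

```r
# Assumed location of the raw files in the DOHMH repository (not stated in the text).
base_url <- "https://raw.githubusercontent.com/nychealth/coronavirus-data/master"

# Assumed relative paths of the three data sets inside the repository.
covid_files <- c(
  daily   = "trends/data-by-day.csv",
  group   = "totals/by-group.csv",
  modzcta = "totals/data-by-modzcta.csv"
)

# Build the full URL for one data set.
build_url <- function(name) {
  paste(base_url, covid_files[[name]], sep = "/")
}

# Read one data set into a data frame (requires an internet connection).
load_covid_data <- function(name) {
  read.csv(build_url(name), stringsAsFactors = FALSE)
}

# Usage (not run here because it needs network access):
# daily   <- load_covid_data("daily")
# group   <- load_covid_data("group")
# modzcta <- load_covid_data("modzcta")
```

The shapefile would be read separately with a spatial package; the output below suggests the sf package was used, but that call is not reproduced here.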

## [1] TRUE TRUE TRUE TRUE TRUE
## Reading layer `geo_export_4a5a72a8-9e18-49e7-aaeb-2b7633fc740b' from data source `/Users/aly_will_mac/Desktop/OLD PC/WILL/LEARNING/1. ALL PROJECTS/NYC COVID19 in R/Shape Files/geo_export_4a5a72a8-9e18-49e7-aaeb-2b7633fc740b.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 178 features and 4 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -74.25559 ymin: 40.49612 xmax: -73.70001 ymax: 40.91553
## Geodetic CRS:  WGS84(DD)

Let’s take a look at the first few rows of each data set.

Data-by-Day

Data-by-Group

Data-by-Modzcta

3. Data Examination

3.1. Data Structure and Summary

A quick look at the structure and summary of the three data sets to determine what needs to be cleaned.

Daily Data

Data Frame Summary

daily

Dimensions: 1043 x 67
Duplicates: 0
No | Variable | Stats / Values | Freqs (% of Valid) | Valid | Missing
1 | date_of_interest [character] | 1. 01/01/2021; 2. 01/01/2022; 3. 01/01/2023; 4. 01/02/2021; 5. 01/02/2022; 6. 01/02/2023; 7. 01/03/2021; 8. 01/03/2022; 9. 01/03/2023; 10. 01/04/2021; [ 1033 others ] | 1 (0.1%) each for values 1-10; 1033 (99.0%) for others | 1043 (100.0%) | 0 (0.0%)
2 | CASE_COUNT [integer] | Mean (sd): 2537.8 (5117.4); min ≤ med ≤ max: 0 ≤ 1528 ≤ 54991; IQR (CV): 2274 (2) | 918 distinct values | 1043 (100.0%) | 0 (0.0%)
3 | PROBABLE_CASE_COUNT [integer] | Mean (sd): 491.8 (641.1); min ≤ med ≤ max: 0 ≤ 369 ≤ 5882; IQR (CV): 588.5 (1.3) | 650 distinct values | 1043 (100.0%) | 0 (0.0%)
4 | HOSPITALIZED_COUNT [integer] | Mean (sd): 176 (250.3); min ≤ med ≤ max: 1 ≤ 103 ≤ 1843; IQR (CV): 145.5 (1.4) | 375 distinct values | 1043 (100.0%) | 0 (0.0%)
5 | DEATH_COUNT [integer] | Mean (sd): 36.2 (77.8); min ≤ med ≤ max: 0 ≤ 13 ≤ 598; IQR (CV): 26 (2.2) | 148 distinct values | 1043 (100.0%) | 0 (0.0%)
6 | PROBABLE_DEATH_COUNT [integer] | Mean (sd): 6.1 (25); min ≤ med ≤ max: 0 ≤ 1 ≤ 240; IQR (CV): 2 (4.1) | 58 distinct values | 1043 (100.0%) | 0 (0.0%)
7 | CASE_COUNT_7DAY_AVG [integer] | Mean (sd): 2530.6 (4699.6); min ≤ med ≤ max: 0 ≤ 1571 ≤ 39487; IQR (CV): 2255 (1.9) | 898 distinct values | 1043 (100.0%) | 0 (0.0%)
8 | ALL_CASE_COUNT_7DAY_AVG [integer] | Mean (sd): 3019.9 (5257.5); min ≤ med ≤ max: 0 ≤ 1977 ≤ 43946; IQR (CV): 2892.5 (1.7) | 914 distinct values | 1043 (100.0%) | 0 (0.0%)
9 | HOSP_COUNT_7DAY_AVG [integer] | Mean (sd): 175.6 (245.9); min ≤ med ≤ max: 0 ≤ 105 ≤ 1663; IQR (CV): 144.5 (1.4) | 354 distinct values | 1043 (100.0%) | 0 (0.0%)
10 | DEATH_COUNT_7DAY_AVG [integer] | Mean (sd): 36.1 (76.9); min ≤ med ≤ max: 0 ≤ 12 ≤ 566; IQR (CV): 25 (2.1) | 150 distinct values | 1043 (100.0%) | 0 (0.0%)
11 | ALL_DEATH_COUNT_7DAY_AVG [integer] | Mean (sd): 42.2 (100.5); min ≤ med ≤ max: 0 ≤ 13 ≤ 775; IQR (CV): 27 (2.4) | 155 distinct values | 1043 (100.0%) | 0 (0.0%)
12 | BX_CASE_COUNT [integer] | Mean (sd): 418.5 (955.8); min ≤ med ≤ max: 0 ≤ 209 ≤ 10559; IQR (CV): 364 (2.3) | 556 distinct values | 1043 (100.0%) | 0 (0.0%)
13 | BX_PROBABLE_CASE_COUNT [integer] | Mean (sd): 96.6 (147.1); min ≤ med ≤ max: 0 ≤ 64 ≤ 1575; IQR (CV): 118.5 (1.5) | 272 distinct values | 1043 (100.0%) | 0 (0.0%)
14 | BX_HOSPITALIZED_COUNT [integer] | Mean (sd): 37.9 (57.9); min ≤ med ≤ max: 0 ≤ 20 ≤ 390; IQR (CV): 32 (1.5) | 155 distinct values | 1043 (100.0%) | 0 (0.0%)
15 | BX_DEATH_COUNT [integer] | Mean (sd): 6.8 (16.3); min ≤ med ≤ max: 0 ≤ 2 ≤ 132; IQR (CV): 4 (2.4) | 64 distinct values | 1043 (100.0%) | 0 (0.0%)
16 | BX_PROBABLE_DEATH_COUNT [integer] | Mean (sd): 1.2 (5.2); min ≤ med ≤ max: 0 ≤ 0 ≤ 46; IQR (CV): 0 (4.4) | 32 distinct values | 1043 (100.0%) | 0 (0.0%)
17 | BX_CASE_COUNT_7DAY_AVG [integer] | Mean (sd): 417.2 (858.7); min ≤ med ≤ max: 0 ≤ 231 ≤ 7479; IQR (CV): 382 (2.1) | 535 distinct values | 1043 (100.0%) | 0 (0.0%)
18 | BX_PROBABLE_CASE_COUNT_7DAY_AVG [integer] | Mean (sd): 96.1 (135.2); min ≤ med ≤ max: 0 ≤ 71 ≤ 1094; IQR (CV): 122 (1.4) | 251 distinct values | 1043 (100.0%) | 0 (0.0%)
19 | BX_ALL_CASE_COUNT_7DAY_AVG [integer] | Mean (sd): 513.3 (985.5); min ≤ med ≤ max: 0 ≤ 304 ≤ 8573; IQR (CV): 496.5 (1.9) | 599 distinct values | 1043 (100.0%) | 0 (0.0%)
20 | BX_HOSPITALIZED_COUNT_7DAY_AVG [integer] | Mean (sd): 37.8 (56.5); min ≤ med ≤ max: 0 ≤ 21 ≤ 358; IQR (CV): 30 (1.5) | 154 distinct values | 1043 (100.0%) | 0 (0.0%)
21 | BX_DEATH_COUNT_7DAY_AVG [integer] | Mean (sd): 6.9 (16); min ≤ med ≤ max: 0 ≤ 2 ≤ 117; IQR (CV): 4 (2.3) | 65 distinct values | 1043 (100.0%) | 0 (0.0%)
22 | BX_ALL_DEATH_COUNT_7DAY_AVG [integer] | Mean (sd): 8 (20.8); min ≤ med ≤ max: 0 ≤ 2 ≤ 158; IQR (CV): 4.5 (2.6) | 68 distinct values | 1043 (100.0%) | 0 (0.0%)
23 | BK_CASE_COUNT [integer] | Mean (sd): 764.5 (1509.8); min ≤ med ≤ max: 0 ≤ 464 ≤ 16660; IQR (CV): 636 (2) | 728 distinct values | 1043 (100.0%) | 0 (0.0%)
24 | BK_PROBABLE_CASE_COUNT [integer] | Mean (sd): 134.9 (179.1); min ≤ med ≤ max: 0 ≤ 100 ≤ 1906; IQR (CV): 152 (1.3) | 335 distinct values | 1043 (100.0%) | 0 (0.0%)
25 | BK_HOSPITALIZED_COUNT [integer] | Mean (sd): 53.4 (73.6); min ≤ med ≤ max: 0 ≤ 31 ≤ 556; IQR (CV): 41 (1.4) | 187 distinct values | 1043 (100.0%) | 0 (0.0%)
26 | BK_DEATH_COUNT [integer] | Mean (sd): 11.3 (24); min ≤ med ≤ max: 0 ≤ 4 ≤ 201; IQR (CV): 8 (2.1) | 80 distinct values | 1043 (100.0%) | 0 (0.0%)
27 | BK_PROBABLE_DEATH_COUNT [integer] | Mean (sd): 2 (8.6); min ≤ med ≤ max: 0 ≤ 0 ≤ 92; IQR (CV): 1 (4.2) | 40 distinct values | 1043 (100.0%) | 0 (0.0%)
28 | BK_CASE_COUNT_7DAY_AVG [integer] | Mean (sd): 762.5 (1393.3); min ≤ med ≤ max: 0 ≤ 478 ≤ 11584; IQR (CV): 637.5 (1.8) | 679 distinct values | 1043 (100.0%) | 0 (0.0%)
29 | BK_PROBABLE_CASE_COUNT_7DAY_AVG [integer] | Mean (sd): 134.3 (166.3); min ≤ med ≤ max: 0 ≤ 106 ≤ 1213; IQR (CV): 149 (1.2) | 321 distinct values | 1043 (100.0%) | 0 (0.0%)
30 | BK_ALL_CASE_COUNT_7DAY_AVG [integer] | Mean (sd): 896.8 (1548.1); min ≤ med ≤ max: 0 ≤ 587 ≤ 12784; IQR (CV): 797 (1.7) | 717 distinct values | 1043 (100.0%) | 0 (0.0%)
31 | BK_HOSPITALIZED_COUNT_7DAY_AVG [integer] | Mean (sd): 53.3 (71.9); min ≤ med ≤ max: 0 ≤ 32 ≤ 491; IQR (CV): 37.5 (1.3) | 180 distinct values | 1043 (100.0%) | 0 (0.0%)
32 | BK_DEATH_COUNT_7DAY_AVG [integer] | Mean (sd): 11.3 (23.6); min ≤ med ≤ max: 0 ≤ 4 ≤ 178; IQR (CV): 8 (2.1) | 83 distinct values | 1043 (100.0%) | 0 (0.0%)
33 | BK_ALL_DEATH_COUNT_7DAY_AVG [integer] | Mean (sd): 13.3 (31.6); min ≤ med ≤ max: 0 ≤ 4 ≤ 252; IQR (CV): 8 (2.4) | 88 distinct values | 1043 (100.0%) | 0 (0.0%)
34 | MN_CASE_COUNT [integer] | Mean (sd): 465.7 (927.9); min ≤ med ≤ max: 0 ≤ 286 ≤ 9112; IQR (CV): 391.5 (2) | 594 distinct values | 1043 (100.0%) | 0 (0.0%)
35 | MN_PROBABLE_CASE_COUNT [integer] | Mean (sd): 91.2 (115.9); min ≤ med ≤ max: 0 ≤ 69 ≤ 972; IQR (CV): 110 (1.3) | 250 distinct values | 1043 (100.0%) | 0 (0.0%)
36 | MN_HOSPITALIZED_COUNT [integer] | Mean (sd): 26.6 (36.6); min ≤ med ≤ max: 0 ≤ 16 ≤ 275; IQR (CV): 24 (1.4) | 127 distinct values | 1043 (100.0%) | 0 (0.0%)
37 | MN_DEATH_COUNT [integer] | Mean (sd): 4.9 (10.1); min ≤ med ≤ max: 0 ≤ 2 ≤ 92; IQR (CV): 4 (2.1) | 50 distinct values | 1043 (100.0%) | 0 (0.0%)
38 | MN_PROBABLE_DEATH_COUNT [integer] | Mean (sd): 0.8 (3.3); min ≤ med ≤ max: 0 ≤ 0 ≤ 33; IQR (CV): 0 (3.9) | 25 distinct values | 1043 (100.0%) | 0 (0.0%)
39 | MN_CASE_COUNT_7DAY_AVG [integer] | Mean (sd): 464.4 (846.1); min ≤ med ≤ max: 0 ≤ 314 ≤ 6394; IQR (CV): 377 (1.8) | 552 distinct values | 1043 (100.0%) | 0 (0.0%)
40 | MN_PROBABLE_CASE_COUNT_7DAY_AVG [integer] | Mean (sd): 90.7 (109.3); min ≤ med ≤ max: 0 ≤ 75 ≤ 766; IQR (CV): 113.5 (1.2) | 235 distinct values | 1043 (100.0%) | 0 (0.0%)
41 | MN_ALL_CASE_COUNT_7DAY_AVG [integer] | Mean (sd): 555.2 (948.8); min ≤ med ≤ max: 0 ≤ 380 ≤ 7160; IQR (CV): 469 (1.7) | 604 distinct values | 1043 (100.0%) | 0 (0.0%)
42 | MN_HOSPITALIZED_COUNT_7DAY_AVG [integer] | Mean (sd): 26.5 (35.6); min ≤ med ≤ max: 0 ≤ 17 ≤ 229; IQR (CV): 25 (1.3) | 129 distinct values | 1043 (100.0%) | 0 (0.0%)
43 | MN_DEATH_COUNT_7DAY_AVG [integer] | Mean (sd): 4.9 (9.8); min ≤ med ≤ max: 0 ≤ 2 ≤ 73; IQR (CV): 3 (2) | 50 distinct values | 1043 (100.0%) | 0 (0.0%)
44 | MN_ALL_DEATH_COUNT_7DAY_AVG [integer] | Mean (sd): 5.7 (12.8); min ≤ med ≤ max: 0 ≤ 2 ≤ 100; IQR (CV): 4 (2.2) | 57 distinct values | 1043 (100.0%) | 0 (0.0%)
45 | QN_CASE_COUNT [integer] | Mean (sd): 709 (1441.4); min ≤ med ≤ max: 0 ≤ 405 ≤ 15217; IQR (CV): 668 (2) | 685 distinct values | 1043 (100.0%) | 0 (0.0%)
46 | QN_PROBABLE_CASE_COUNT [integer] | Mean (sd): 136.2 (173); min ≤ med ≤ max: 0 ≤ 98 ≤ 1609; IQR (CV): 176 (1.3) | 344 distinct values | 1043 (100.0%) | 0 (0.0%)
47 | QN_HOSPITALIZED_COUNT [integer] | Mean (sd): 49.5 (77.4); min ≤ med ≤ max: 0 ≤ 27 ≤ 609; IQR (CV): 39.5 (1.6) | 179 distinct values | 1043 (100.0%) | 0 (0.0%)
48 | QN_DEATH_COUNT [integer] | Mean (sd): 10.8 (24.6); min ≤ med ≤ max: 0 ≤ 4 ≤ 202; IQR (CV): 7 (2.3) | 80 distinct values | 1043 (100.0%) | 0 (0.0%)
49 | QN_PROBABLE_DEATH_COUNT [integer] | Mean (sd): 1.7 (7.4); min ≤ med ≤ max: 0 ≤ 0 ≤ 68; IQR (CV): 1 (4.3) | 38 distinct values | 1043 (100.0%) | 0 (0.0%)
50 | QN_CASE_COUNT_7DAY_AVG [integer] | Mean (sd): 706.8 (1332.8); min ≤ med ≤ max: 0 ≤ 425 ≤ 11548; IQR (CV): 684.5 (1.9) | 677 distinct values | 1043 (100.0%) | 0 (0.0%)
51 | QN_PROBABLE_CASE_COUNT_7DAY_AVG [integer] | Mean (sd): 135.5 (162.4); min ≤ med ≤ max: 0 ≤ 102 ≤ 1220; IQR (CV): 180.5 (1.2) | 328 distinct values | 1043 (100.0%) | 0 (0.0%)
52 | QN_ALL_CASE_COUNT_7DAY_AVG [integer] | Mean (sd): 842.3 (1481); min ≤ med ≤ max: 0 ≤ 539 ≤ 12685; IQR (CV): 883 (1.8) | 706 distinct values | 1043 (100.0%) | 0 (0.0%)
53 | QN_HOSPITALIZED_COUNT_7DAY_AVG [integer] | Mean (sd): 49.4 (76); min ≤ med ≤ max: 0 ≤ 28 ≤ 562; IQR (CV): 39.5 (1.5) | 175 distinct values | 1043 (100.0%) | 0 (0.0%)
54 | QN_DEATH_COUNT_7DAY_AVG [integer] | Mean (sd): 10.8 (24.2); min ≤ med ≤ max: 0 ≤ 4 ≤ 177; IQR (CV): 8 (2.2) | 76 distinct values | 1043 (100.0%) | 0 (0.0%)
55 | QN_ALL_DEATH_COUNT_7DAY_AVG [integer] | Mean (sd): 12.6 (31.1); min ≤ med ≤ max: 0 ≤ 4 ≤ 240; IQR (CV): 8 (2.5) | 78 distinct values | 1043 (100.0%) | 0 (0.0%)
56 | SI_CASE_COUNT [integer] | Mean (sd): 179.3 (335); min ≤ med ≤ max: 0 ≤ 111 ≤ 3719; IQR (CV): 159 (1.9) | 362 distinct values | 1043 (100.0%) | 0 (0.0%)
57 | SI_PROBABLE_CASE_COUNT [integer] | Mean (sd): 32.8 (36.8); min ≤ med ≤ max: 0 ≤ 25 ≤ 316; IQR (CV): 44 (1.1) | 123 distinct values | 1043 (100.0%) | 0 (0.0%)
58 | SI_HOSPITALIZED_COUNT [integer] | Mean (sd): 10.7 (11.9); min ≤ med ≤ max: 0 ≤ 7 ≤ 83; IQR (CV): 11 (1.1) | 65 distinct values | 1043 (100.0%) | 0 (0.0%)
59 | SI_DEATH_COUNT [integer] | Mean (sd): 2.3 (3.9); min ≤ med ≤ max: 0 ≤ 1 ≤ 34; IQR (CV): 3 (1.7) | 28 distinct values | 1043 (100.0%) | 0 (0.0%)
60 | SI_PROBABLE_DEATH_COUNT [integer] | Mean (sd): 0.3 (1); min ≤ med ≤ max: 0 ≤ 0 ≤ 9; IQR (CV): 0 (3.8) | 0: 908 (87.1%); 1: 94 (9.0%); 2: 12 (1.2%); 3: 7 (0.7%); 4: 6 (0.6%); 5: 4 (0.4%); 6: 3 (0.3%); 7: 3 (0.3%); 8: 3 (0.3%); 9: 3 (0.3%) | 1043 (100.0%) | 0 (0.0%)
61 | SI_CASE_COUNT_7DAY_AVG [integer] | Mean (sd): 178.8 (307.6); min ≤ med ≤ max: 0 ≤ 118 ≤ 2686; IQR (CV): 160.5 (1.7) | 352 distinct values | 1043 (100.0%) | 0 (0.0%)
62 | SI_PROBABLE_CASE_COUNT_7DAY_AVG [integer] | Mean (sd): 32.6 (34.1); min ≤ med ≤ max: 0 ≤ 27 ≤ 233; IQR (CV): 42.5 (1) | 118 distinct values | 1043 (100.0%) | 0 (0.0%)
63 | SI_ALL_CASE_COUNT_7DAY_AVG [integer] | Mean (sd): 211.4 (336.7); min ≤ med ≤ max: 0 ≤ 148 ≤ 2905; IQR (CV): 205 (1.6) | 406 distinct values | 1043 (100.0%) | 0 (0.0%)
64 | SI_HOSPITALIZED_COUNT_7DAY_AVG [integer] | Mean (sd): 10.7 (11.4); min ≤ med ≤ max: 0 ≤ 8 ≤ 72; IQR (CV): 10 (1.1) | 60 distinct values | 1043 (100.0%) | 0 (0.0%)
65 | SI_DEATH_COUNT_7DAY_AVG [integer] | Mean (sd): 2.2 (3.6); min ≤ med ≤ max: 0 ≤ 1 ≤ 26; IQR (CV): 1 (1.6) | 24 distinct values | 1043 (100.0%) | 0 (0.0%)
66 | SI_ALL_DEATH_COUNT_7DAY_AVG [integer] | Mean (sd): 2.5 (4.4); min ≤ med ≤ max: 0 ≤ 1 ≤ 34; IQR (CV): 2 (1.8) | 30 distinct values | 1043 (100.0%) | 0 (0.0%)
67 | INCOMPLETE [integer] | Min: 0; Mean: 409.1; Max: 60960 | 0: 1036 (99.3%); 60960: 7 (0.7%) | 1043 (100.0%) | 0 (0.0%)

Generated by summarytools 1.0.1 (R version 4.2.1)
2023-01-09

Group Data

Data Frame Summary

group

Dimensions: 27 x 11
Duplicates: 0
No | Variable | Stats / Values | Freqs (% of Valid) | Valid | Missing
1 | group [character] | 1. Age group; 2. Borough; 3. Citywide; 4. Poverty; 5. Race; 6. Sex | 11 (40.7%); 5 (18.5%); 1 (3.7%); 4 (14.8%); 4 (14.8%); 2 (7.4%) | 27 (100.0%) | 0 (0.0%)
2 | subgroup [character] | 1. (Empty string); 2. 0-17; 3. 0-4; 4. 13-17; 5. 18-24; 6. 25-34; 7. 35-44; 8. 45-54; 9. 5-12; 10. 55-64; [ 17 others ] | 1 (3.7%) each for values 1-10; 17 (63.0%) for others | 27 (100.0%) | 0 (0.0%)
3 | CONFIRMED_CASE_RATE [numeric] | Mean (sd): 30133 (4664.7); min ≤ med ≤ max: 18687 ≤ 30298 ≤ 39266.1; IQR (CV): 5167.4 (0.2) | 26 distinct values | 26 (96.3%) | 1 (3.7%)
4 | CASE_RATE [numeric] | Mean (sd): 35972 (5666); min ≤ med ≤ max: 22194 ≤ 36267.8 ≤ 46453.9; IQR (CV): 6337.1 (0.2) | 26 distinct values | 26 (96.3%) | 1 (3.7%)
5 | HOSPITALIZED_RATE [numeric] | Mean (sd): 2402.5 (1963); min ≤ med ≤ max: 238.1 ≤ 2324.7 ≤ 10440.3; IQR (CV): 1135.7 (0.8) | 26 distinct values | 26 (96.3%) | 1 (3.7%)
6 | DEATH_RATE [numeric] | Mean (sd): 651.5 (814.9); min ≤ med ≤ max: 2.5 ≤ 536.8 ≤ 4184.2; IQR (CV): 246.3 (1.3) | 24 distinct values | 24 (88.9%) | 3 (11.1%)
7 | CONFIRMED_CASE_COUNT [integer] | Mean (sd): 585428.3 (538713.1); min ≤ med ≤ max: 97867 ≤ 429007.5 ≤ 2646974; IQR (CV): 422819.5 (0.9) | 26 distinct values | 26 (96.3%) | 1 (3.7%)
8 | PROBABLE_CASE_COUNT [integer] | Mean (sd): 113484.4 (104616.7); min ≤ med ≤ max: 18367 ≤ 87584 ≤ 512925; IQR (CV): 83416.5 (0.9) | 26 distinct values | 26 (96.3%) | 1 (3.7%)
9 | CASE_COUNT [integer] | Mean (sd): 698912.8 (643116.2); min ≤ med ≤ max: 116234 ≤ 519248.5 ≤ 3159899; IQR (CV): 508191.8 (0.9) | 26 distinct values | 26 (96.3%) | 1 (3.7%)
10 | HOSPITALIZED_COUNT [integer] | Mean (sd): 45459.3 (42128.6); min ≤ med ≤ max: 1645 ≤ 37145.5 ≤ 200978; IQR (CV): 40619.2 (0.9) | 26 distinct values | 26 (96.3%) | 1 (3.7%)
11 | DEATH_COUNT [integer] | Mean (sd): 12074.8 (10565.9); min ≤ med ≤ max: 43 ≤ 9280.5 ≤ 44127; IQR (CV): 14226 (0.9) | 24 distinct values | 24 (88.9%) | 3 (11.1%)


Modzcta Data

Data Frame Summary

modzcta

Dimensions: 177 x 19
Duplicates: 0
No | Variable | Stats / Values | Freqs (% of Valid) | Valid | Missing
1 | MODIFIED_ZCTA [integer] | Mean (sd): 10810.4 (578.2); min ≤ med ≤ max: 10001 ≤ 11109 ≤ 11697; IQR (CV): 1060 (0.1) | 177 distinct values | 177 (100.0%) | 0 (0.0%)
2 | NEIGHBORHOOD_NAME [character] | 1. Financial District; 2. Hell's Kitchen/Midtown Ma; 3. Lenox Hill/Upper East Sid; 4. Battery Park City; 5. Cypress Hills/East New Yo; 6. Douglaston-Little Neck; 7. East Harlem; 8. Lincoln Square; 9. Ozone Park; 10. Queens Village; [ 152 others ] | 3 (1.7%) each for values 1-3; 2 (1.1%) each for values 4-10; 154 (87.0%) for others | 177 (100.0%) | 0 (0.0%)
3 | BOROUGH_GROUP [character] | 1. Bronx; 2. Brooklyn; 3. Manhattan; 4. Queens; 5. Staten Island | 25 (14.1%); 37 (20.9%); 44 (24.9%); 59 (33.3%); 12 (6.8%) | 177 (100.0%) | 0 (0.0%)
4 | label [character] | 1. 10001, 10118; 2. 10002; 3. 10003; 4. 10004; 5. 10005; 6. 10006; 7. 10007; 8. 10009; 9. 10010; 10. 10011; [ 167 others ] | 1 (0.6%) each for values 1-10; 167 (94.4%) for others | 177 (100.0%) | 0 (0.0%)
5 | lat [numeric] | Mean (sd): 40.7 (0.1); min ≤ med ≤ max: 40.5 ≤ 40.7 ≤ 40.9; IQR (CV): 0.1 (0) | 177 distinct values | 177 (100.0%) | 0 (0.0%)
6 | lon [numeric] | Mean (sd): -73.9 (0.1); min ≤ med ≤ max: -74.2 ≤ -73.9 ≤ -73.7; IQR (CV): 0.1 (0) | 177 distinct values | 177 (100.0%) | 0 (0.0%)
7 | COVID_CONFIRMED_CASE_COUNT [integer] | Mean (sd): 14346.8 (8085.1); min ≤ med ≤ max: 1030 ≤ 13389 ≤ 33655; IQR (CV): 11804 (0.6) | 176 distinct values | 177 (100.0%) | 0 (0.0%)
8 | COVID_PROBABLE_CASE_COUNT [integer] | Mean (sd): 2815 (1512.3); min ≤ med ≤ max: 206 ≤ 2576 ≤ 6900; IQR (CV): 2151 (0.5) | 174 distinct values | 177 (100.0%) | 0 (0.0%)
9 | COVID_CASE_COUNT [integer] | Mean (sd): 17161.9 (9479.9); min ≤ med ≤ max: 1281 ≤ 15817 ≤ 39439; IQR (CV): 13822 (0.6) | 177 distinct values | 177 (100.0%) | 0 (0.0%)
10 | COVID_CONFIRMED_CASE_RATE [numeric] | Mean (sd): 30583.5 (4598.6); min ≤ med ≤ max: 18788.3 ≤ 29941.3 ≤ 47911; IQR (CV): 5504 (0.2) | 177 distinct values | 177 (100.0%) | 0 (0.0%)
11 | COVID_CASE_RATE [numeric] | Mean (sd): 36783.4 (5115.4); min ≤ med ≤ max: 23995.2 ≤ 36025 ≤ 58060.8; IQR (CV): 5973.9 (0.1) | 177 distinct values | 177 (100.0%) | 0 (0.0%)
12 | POP_DENOMINATOR [numeric] | Mean (sd): 47100.7 (26151.6); min ≤ med ≤ max: 2972.1 ≤ 42737.3 ≤ 110369.8; IQR (CV): 39675.5 (0.6) | 177 distinct values | 177 (100.0%) | 0 (0.0%)
13 | COVID_CONFIRMED_DEATH_COUNT [integer] | Mean (sd): 209.1 (150.8); min ≤ med ≤ max: 0 ≤ 168 ≤ 768; IQR (CV): 220 (0.7) | 139 distinct values | 177 (100.0%) | 0 (0.0%)
14 | COVID_PROBABLE_DEATH_COUNT [integer] | Mean (sd): 35.1 (26.6); min ≤ med ≤ max: 0 ≤ 28 ≤ 109; IQR (CV): 33 (0.8) | 75 distinct values | 177 (100.0%) | 0 (0.0%)
15 | COVID_DEATH_COUNT [integer] | Mean (sd): 244.2 (175.3); min ≤ med ≤ max: 1 ≤ 198 ≤ 870; IQR (CV): 252 (0.7) | 148 distinct values | 177 (100.0%) | 0 (0.0%)
16 | COVID_CONFIRMED_DEATH_RATE [numeric] | Mean (sd): 420.6 (186.5); min ≤ med ≤ max: 0 ≤ 416.6 ≤ 1290.1; IQR (CV): 192.2 (0.4) | 177 distinct values | 177 (100.0%) | 0 (0.0%)
17 | COVID_DEATH_RATE [numeric] | Mean (sd): 491.3 (217.5); min ≤ med ≤ max: 11.4 ≤ 485.1 ≤ 1505.1; IQR (CV): 233 (0.4) | 177 distinct values | 177 (100.0%) | 0 (0.0%)
18 | PERCENT_POSITIVE [numeric] | Mean (sd): 24.8 (4.6); min ≤ med ≤ max: 7.9 ≤ 25.3 ≤ 36.3; IQR (CV): 4.6 (0.2) | 162 distinct values | 177 (100.0%) | 0 (0.0%)
19 | TOTAL_COVID_TESTS [integer] | Mean (sd): 53480.7 (29179.4); min ≤ med ≤ max: 4224 ≤ 48572 ≤ 128535; IQR (CV): 44379 (0.5) | 177 distinct values | 177 (100.0%) | 0 (0.0%)


3.2. Missing Data

From the data summary tables, only the ‘group’ data has missing values. Let’s check again to make sure.

Daily Data

Group Data

Modzcta Data

Only 4.4% of the values in the group data are missing, all of which come from the age category. In Section 4.1, I re-code the sub-categories under the age group, which addresses the missing data issue.
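As a rough illustration of how such a missing-value check can be done in base R, the sketch below counts missing cells per column and overall. The toy data frame is made up and is not the real group data. The last line repeats the arithmetic for the group data using the counts in the summary table of Section 3.1 (13 missing cells in a 27 x 11 table).

```r
# Toy data frame standing in for the group data (values are made up).
toy <- data.frame(
  subgroup    = c("0-17", "0-4", "5-12", "13-17"),
  CASE_COUNT  = c(NA, 10L, 20L, 30L),
  DEATH_COUNT = c(5L, NA, NA, NA),
  stringsAsFactors = FALSE
)

# Number of missing values in each column.
missing_by_column <- colSums(is.na(toy))

# Share of all cells that are missing, in percent.
missing_share <- 100 * mean(is.na(toy))

# Same arithmetic for the group data, using the summary table counts:
# 13 missing cells out of 27 rows x 11 columns.
group_missing_share <- round(100 * 13 / (27 * 11), 1)
```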

4. Data Wrangling

4.1. Consolidate ‘0-17’ age group

Under the Age group category, the 0-17 group has three sub-groupings (0-4, 5-12, 13-17). However, the DEATH_RATE and DEATH_COUNT statistics are provided only for the 0-17 age group, while the other COVID statistics are provided only for the three sub-categories and not for the main 0-17 category. This creates missing values in the rows containing these age categories, as shown below.

Let’s clean this up by merging all the numbers under the main category (0-17) and removing the three sub-categories.

First let’s save the sub-categories under “0-17” - we may need them later.

Now, use the rollsumr function to add up the three subcategories into the 0-17 subgroup.

After that, delete the three subcategories.

Now, check to see if the re-coding took care of the missing values in the group data.
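The consolidation steps above can be sketched in base R as follows. This is not the original code: it uses a plain sum in place of rollsumr, works on a made-up toy table, and only handles count columns. How the rate columns were treated in the original analysis is not shown in this write-up, so they are left out here.

```r
# Toy stand-in for the age rows of the group data (values are made up).
group_toy <- data.frame(
  group              = "Age group",
  subgroup           = c("0-17", "0-4", "5-12", "13-17", "18-24"),
  CASE_COUNT         = c(NA, 100L, 200L, 300L, 400L),
  HOSPITALIZED_COUNT = c(NA, 5L, 6L, 7L, 8L),
  DEATH_COUNT        = c(3L, NA, NA, NA, 9L),
  stringsAsFactors   = FALSE
)

sub_ages   <- c("0-4", "5-12", "13-17")
count_cols <- c("CASE_COUNT", "HOSPITALIZED_COUNT")

# Keep a copy of the sub-categories in case they are needed later.
age_subgroups <- group_toy[group_toy$subgroup %in% sub_ages, ]

# Add the three sub-category counts into the 0-17 row, one column at a time.
is_parent <- group_toy$subgroup == "0-17"
for (col in count_cols) {
  group_toy[is_parent, col] <- sum(age_subgroups[[col]])
}

# Drop the three sub-category rows.
group_clean <- group_toy[!group_toy$subgroup %in% sub_ages, ]

# Check that no missing values remain.
remaining_missing <- sum(is.na(group_clean))
```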

4.2. Clean the Staten Island subgroup

In the group table, Staten Island is written as StatenIsland as shown in the table below.

Borough Count
Bronx 1
Brooklyn 1
Manhattan 1
Queens 1
StatenIsland 1

Let’s add a white space to separate the two words and check to make sure it worked.

Borough Count
Bronx 1
Brooklyn 1
Manhattan 1
Queens 1
Staten Island 1
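One way to make this fix, shown here as a minimal sketch on a standalone character vector rather than the actual group table, is a fixed-string substitution:

```r
# Borough labels as they appear in the group table before cleaning.
boroughs <- c("Bronx", "Brooklyn", "Manhattan", "Queens", "StatenIsland")

# Replace the run-together label with the two-word spelling.
boroughs_clean <- sub("StatenIsland", "Staten Island", boroughs, fixed = TRUE)

# Tabulate the cleaned labels to confirm the change.
borough_counts <- table(boroughs_clean)
```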

4.3. Change data types for date_of_interest

In the daily data, the date_of_interest column is a string variable. Let’s change it to a date.

## [1] "Date"
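A minimal sketch of this conversion in base R is shown below. It assumes the strings are in month/day/year order, which matches values such as 01/04/2021 in the summary table but is not stated explicitly in the source.

```r
# Example strings in the same layout as date_of_interest.
date_strings <- c("01/01/2021", "01/02/2021", "12/31/2022")

# Parse as month/day/year (assumed order).
dates <- as.Date(date_strings, format = "%m/%d/%Y")

# Confirm the new type.
date_class <- class(dates)
```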

5. Analyzing Citywide Impact

This section analyzes and summarizes COVID cases throughout the City.

5.1. Citywide: Total Cases

Let’s look at the total number of cases, hospitalizations and deaths throughout the five boroughs.

Total Infections Total Hospitalizations Total Deaths
3,159,899 200,978 44,127

The charts below show the trends in daily Citywide cases since the beginning of the pandemic.

Infections

Hospitalizations

Deaths

5.2. Citywide: New Cases

The table below shows the new infections, hospitalizations and deaths recorded on 2023-01-06, the latest date on record.

Date Infections Hospitalizations Deaths
2023-01-06 1,957 9 0
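Picking out the most recent row can be done by filtering on the maximum date. The sketch below uses a made-up three-row table, not the real daily data, and assumes date_of_interest has already been converted to a Date.

```r
# Toy daily table (counts are made up); rows are deliberately out of order.
daily_toy <- data.frame(
  date_of_interest = as.Date(c("2023-01-04", "2023-01-06", "2023-01-05")),
  CASE_COUNT       = c(10L, 30L, 20L)
)

# Keep only the row with the latest date.
latest <- daily_toy[daily_toy$date_of_interest == max(daily_toy$date_of_interest), ]
```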

6. Analyzing Covid Impact by Borough

This section breaks down the daily and total COVID cases by borough.

6.1. Total Cases by Borough

The chart below shows the total number of COVID cases by borough. It gives the raw numbers of infections, hospitalizations and deaths since the beginning of the pandemic.

Because these are raw counts (and not rates), more populous boroughs are likely to have higher numbers of cases.

6.2. Daily Average Cases by Borough

The charts below show the trends in the daily average infections, hospitalizations and deaths per borough.

It appears that Brooklyn and Queens have consistently posted the highest daily average cases.

Average Infections by Borough

Average Hospitalizations by Borough

Average Deaths by Borough
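The daily data already ships with 7-day average columns (for example BX_CASE_COUNT_7DAY_AVG), so nothing needs to be computed for these charts. For readers who want to see what such a column represents, here is a sketch of a trailing 7-day mean in base R. The helper name trailing_mean is hypothetical, and the assumption that the window is trailing (the current day plus the six days before it) is mine; DOHMH's exact definition and rounding are not given in this write-up.

```r
# Trailing k-day mean: the average of each day and the k - 1 days before it.
# The first k - 1 positions have no full window and are left as NA.
trailing_mean <- function(x, k = 7) {
  n   <- length(x)
  out <- rep(NA_real_, n)
  if (n < k) return(out)
  cs  <- c(0, cumsum(x))          # cs[i + 1] holds the sum of x[1..i]
  idx <- k:n
  out[idx] <- (cs[idx + 1] - cs[idx + 1 - k]) / k
  out
}

# Made-up daily counts for illustration.
daily_counts <- 1:10
avg7 <- trailing_mean(daily_counts)
```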

6.3. Which borough saw the largest share of cases turn into hospitalizations and deaths?

The chart shows total hospitalizations and deaths as a percentage of total infections for each borough.

The chart indicates that, even though Brooklyn has the highest number of COVID cases (see Section 6.1), the Bronx has seen the largest share of its cases lead to hospitalizations and deaths. This may be because, while less populous than Brooklyn, the Bronx has many more people with underlying medical conditions that exacerbate the effects of COVID. For example, the Bronx is known to have the highest asthma hospitalization rate in New York State.¹
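The share itself is a simple ratio. As a worked example, the sketch below applies it to the citywide totals reported in Section 5.1, since the per-borough counts are only shown in the chart. The variable names are illustrative.

```r
# Citywide totals from Section 5.1.
total_infections       <- 3159899
total_hospitalizations <- 200978
total_deaths           <- 44127

# Hospitalizations and deaths as a percentage of infections.
hosp_share  <- round(100 * total_hospitalizations / total_infections, 1)
death_share <- round(100 * total_deaths / total_infections, 1)
```

The same two lines, applied to each borough's totals, give the percentages plotted above.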

6.4. Which boroughs have been hit the hardest?

Section 6.1 shows that Brooklyn has the highest number of cases, hospitalizations and deaths among all boroughs. This makes sense since Brooklyn is the most populous of the five boroughs. However, to compare boroughs and determine which one is most affected, we have to adjust for population. Hence, we use the rate (per 100,000 people) statistics.

Below are the infection, hospitalization and death rates (per 100,000 people) for each borough.

As we can see, even though Brooklyn has the highest case count (see Section 6.1), Staten Island has the highest case rate once you adjust for population.

Also, while Brooklyn has the highest count of hospitalizations and deaths, the Bronx has the highest rate of hospitalizations and deaths after adjusting for population.
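The group data already provides these rates, so the following is only a sketch of the adjustment they represent: count divided by population, scaled to 100,000 people. The two areas and all numbers below are hypothetical and chosen only to show how a place with fewer cases can still have the higher rate.

```r
# Rate per 100,000 people.
rate_per_100k <- function(count, population) {
  count / population * 1e5
}

# Hypothetical areas: A has more cases, B has the higher rate.
areas <- data.frame(
  area       = c("A", "B"),
  cases      = c(50000, 20000),
  population = c(2500000, 500000)
)
areas$case_rate <- rate_per_100k(areas$cases, areas$population)
```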

7. Analyzing by Age Group

This section details how COVID has affected the various age groups in the City. The data set breaks age down into eight categories - 0-17, 18-24, 25-34, 35-44, 45-54, 55-64, 65-74, and 75+.

7.1. Case, Hospitalization and Death Rates

The first tab shows the rate (per 100,000) of cases, hospitalizations and deaths for the various age groups. The second tab shows hospitalization and death rates as a share of case rates.

Case, Hospitalization and Death Rates

Share of Cases that Lead to Hospitalization or Death

The two tables indicate that, while young people (under 45 years) are infected at a higher rate than any other age group, only a small share are hospitalized and barely any die from the virus. However, seniors, especially those 75 years and over, tend to be hospitalized and die at the highest rates even though they have the lowest infection rates. This is consistent with reports that COVID is much more deadly among seniors.

8. Analysis by Race/Ethnicity

This section details how COVID has affected people of different races and ethnicities in the City. The data set breaks race/ethnicity into four categories - Asian/Pacific-Islander, Black/African-American, Hispanic/Latino and White.

Case, Hospitalization and Death Rates

The first tab shows the infection, hospitalization and death rates (per 100,000) for each race/ethnicity.

The second tab shows hospitalization and death rates as a share of case rates.

Case, Hospitalization and Death Rates

Share of Cases that Cause Hospitalization or Death

The two charts indicate that, while African-Americans have one of the lowest infection rates, they tend to be hospitalized or die from the virus at the highest rates.

9. Map: COVID-19 Cases by Neighborhood

In this section, I create some choropleth maps to visualize the infection and death rates (per 100K) for each neighborhood.

First, I merge the modzcta shapefile and data frame (into the object modzcta_merge) and use that for the mapping.
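The merge can be sketched with plain data frames as below. The real shapefile object is a spatial (simple feature) table, so ordinary data frames stand in for it here. The shapefile key name modzcta, the toy ZIP codes and the rates are assumptions for illustration. The left join (all.x = TRUE) keeps every shape, which matters because the shapefile has 178 features while the data has 177 rows, so at least one shape has no matching data.

```r
# Stand-in for the shapefile attribute table (key column name is assumed).
shapes <- data.frame(
  modzcta  = c(10001L, 10002L, 99999L),
  shape_id = 1:3
)

# Stand-in for the modzcta data (rates are made up).
modzcta_toy <- data.frame(
  MODIFIED_ZCTA   = c(10001L, 10002L),
  COVID_CASE_RATE = c(30000, 35000)
)

# Left join: keep every shape, attach data where the codes match.
modzcta_merge <- merge(
  shapes, modzcta_toy,
  by.x = "modzcta", by.y = "MODIFIED_ZCTA",
  all.x = TRUE
)

# Shapes without matching data end up with NA rates.
unmatched <- modzcta_merge[is.na(modzcta_merge$COVID_CASE_RATE), ]
```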

Infection Rate (PER 100K) by Neighborhood

Death Rate (PER 100K) by Neighborhood

10. Conclusion

The following are the takeaways as of 2023-01-06.

  • Infections peaked in January 2022, during the Omicron wave.

  • However, hospitalizations and deaths reached their peaks during the first wave of the pandemic (April 2020). Because of the availability of vaccines, the Omicron wave did not cause as much hospitalization and was not as deadly as the 2020 wave of infections.

  • Because of the size of its population, Brooklyn has seen the highest number of infections, hospitalizations and deaths since the beginning of the pandemic compared to the other boroughs.

    • However, after adjusting for population, Staten Island has had the highest rate of infection (per 100K people), while the Bronx has had the highest hospitalization and death rates.
    • The Bronx has seen the largest share of all cases lead to hospitalization (7.9 percent) and death (1.5 percent).

  • In terms of age, young people under 45 years have the highest rate of infection. Yet, seniors over 65 years tend to be hospitalized and die at higher rates.

    • Those aged 75 and over have seen the largest share of their cases lead to hospitalization (34.6 percent) and death (14.1 percent).

  • Even though African-Americans have one of the lowest infection rates, they tend to be hospitalized and die at higher rates compared to other races/ethnicities.

    • African-Americans have seen the largest share of their cases lead to hospitalization (9.6 percent) and death (3.4 percent).